Poster: Kokonor Date: Nov 16, 2013 8:13pm
Forum: texts Subject: dead links / downloading pdf files

Thanks a bunch!


Poster: aibek Date: Nov 18, 2013 6:27am
Forum: texts Subject: Re: dead links / downloading pdf files

As far as I understand, for some reason, Wayback Machine almost always got (or saved?) incomplete files. I think only the following two files are present in full. (Are these the two files you already have?)

http://web.archive.org/web/20081209131248/http://www.thdl.org/community/pdfs/schoolrep.pdf
http://web.archive.org/web/20061119001955/http://www.thdl.org:80/community/pdfs/DhikgoSolarRep.pdf

You may also try the following links, but I am almost certain that all of them will fail with incomplete downloads.

What exactly are you looking to do? Have you considered writing to the maintainers of the new site for a copy?

http://web.archive.org/web/20080517054713/http://www.thdl.org/community/pdfs/BridFundComputerProjRep.pdf
http://web.archive.org/web/20080517050056/http://www.thdl.org/community/pdfs/britishwaterrep.pdf
http://web.archive.org/web/20080517054427/http://www.thdl.org/community/pdfs/BrownWaterRep.pdf
http://web.archive.org/web/20080517045433/http://www.thdl.org/community/pdfs/canadathuwatsangschool.pdf
http://web.archive.org/web/20080517045811/http://www.thdl.org/community/pdfs/canadawaterirrigation.pdf
http://web.archive.org/web/20080517054319/http://www.thdl.org/community/pdfs/canadianwaterrep.pdf
http://web.archive.org/web/20080517050221/http://www.thdl.org/community/pdfs/coatprojectrep.pdf
http://web.archive.org/web/20080517045611/http://www.thdl.org/community/pdfs/DanmaSchoolRep.pdf
http://web.archive.org/web/20080517050248/http://www.thdl.org/community/pdfs/donglakacisternrep.pdf
http://web.archive.org/web/20080517053540/http://www.thdl.org/community/pdfs/duoraschoolrep.pdf
http://web.archive.org/web/20080517045734/http://www.thdl.org/community/pdfs/germansolarcooker2.pdf
http://web.archive.org/web/20080517045524/http://www.thdl.org/community/pdfs/germansolarcooker3.pdf
http://web.archive.org/web/20080517054254/http://www.thdl.org/community/pdfs/germansolarcookerrep.pdf
http://web.archive.org/web/20080517053900/http://www.thdl.org/community/pdfs/guinanlibrary.pdf
http://web.archive.org/web/20080517054908/http://www.thdl.org/community/pdfs/healthcarereport.pdf
http://web.archive.org/web/20080517054926/http://www.thdl.org/community/pdfs/honriwaterrep.pdf
http://web.archive.org/web/20080517053748/http://www.thdl.org/community/pdfs/IrrigationRep.pdf
http://web.archive.org/web/20080517045919/http://www.thdl.org/community/pdfs/karangschoolrep.pdf
http://web.archive.org/web/20080517054038/http://www.thdl.org/community/pdfs/KhasoWaterRep.pdf
http://web.archive.org/web/20080517054615/http://www.thdl.org/community/pdfs/ladeschoolreport.pdf
http://web.archive.org/web/20080517045655/http://www.thdl.org/community/pdfs/LarimaSolarPanelAndLibraryRep.pdf
http://web.archive.org/web/20080517050343/http://www.thdl.org/community/pdfs/LeduBridgeRep.pdf
http://web.archive.org/web/20080517054201/http://www.thdl.org/community/pdfs/ledulibraryreport.pdf
http://web.archive.org/web/20080517054228/http://www.thdl.org/community/pdfs/literacyprojectrep.pdf
http://web.archive.org/web/20080517050024/http://www.thdl.org/community/pdfs/LosangYakaVillageReport.pdf
http://web.archive.org/web/20080517053934/http://www.thdl.org/community/pdfs/MahonSolarCookerRep.pdf
http://web.archive.org/web/20080517054006/http://www.thdl.org/community/pdfs/MillProjectRep.pdf
http://web.archive.org/web/20080517054449/http://www.thdl.org/community/pdfs/muhongirrigationreport.pdf
http://web.archive.org/web/20080517053626/http://www.thdl.org/community/pdfs/muhongpigstyreport.pdf
http://web.archive.org/web/20080517050153/http://www.thdl.org/community/pdfs/navisolarcookers.pdf
http://web.archive.org/web/20080517053346/http://www.thdl.org/community/pdfs/newspapereport.pdf
http://web.archive.org/web/20080517045951/http://www.thdl.org/community/pdfs/newzealandwater.pdf
http://web.archive.org/web/20080517053710/http://www.thdl.org/community/pdfs/nigasolarpanelsreport.pdf
http://web.archive.org/web/20080517054555/http://www.thdl.org/community/pdfs/NyamoVilSolCookRep.pdf
http://web.archive.org/web/20080517054634/http://www.thdl.org/community/pdfs/rinosolarcookerrep.pdf
http://web.archive.org/web/20080517054811/http://www.thdl.org/community/pdfs/rizangschoolreport.pdf
http://web.archive.org/web/20080517053450/http://www.thdl.org/community/pdfs/SajiShrineRep.pdf
http://web.archive.org/web/20080517054406/http://www.thdl.org/community/pdfs/SchoolLibraryRep.pdf
http://web.archive.org/web/20080517054850/http://www.thdl.org/community/pdfs/SecondHandClothesRep.pdf
http://web.archive.org/web/20080517054943/http://www.thdl.org/community/pdfs/SolarCookerHonriVillage.pdf
http://web.archive.org/web/20080517053824/http://www.thdl.org/community/pdfs/solarrep.pdf
http://web.archive.org/web/20080517054654/http://www.thdl.org/community/pdfs/solomonenglishtraining.pdf
http://web.archive.org/web/20080517054733/http://www.thdl.org/community/pdfs/SumbaWaterReport.pdf
http://web.archive.org/web/20080517054135/http://www.thdl.org/community/pdfs/thurstonsolarcooker.pdf
http://web.archive.org/web/20080517050126/http://www.thdl.org/community/pdfs/thuwatsangschool.pdf
http://web.archive.org/web/20080517054959/http://www.thdl.org/community/pdfs/trakmarschoolrep.pdf
http://web.archive.org/web/20080517054343/http://www.thdl.org/community/pdfs/tuttlesolarcookersrep.pdf
http://web.archive.org/web/20080517054107/http://www.thdl.org/community/pdfs/WagaGongmaWaterRep.pdf
http://web.archive.org/web/20080517045846/http://www.thdl.org/community/pdfs/xerthangwaterreport.pdf
http://web.archive.org/web/20080517050315/http://www.thdl.org/community/pdfs/xianaosolarpanelsreport.pdf
http://web.archive.org/web/20080517054752/http://www.thdl.org/community/pdfs/YakLoanRep.pdf
http://web.archive.org/web/20080517054533/http://www.thdl.org/community/pdfs/YamaTashiKhyilSolCook.pdf
http://web.archive.org/web/20080517054831/http://www.thdl.org/community/pdfs/zhangchujoirrigationreport.pdf
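A rough way to tell whether a downloaded copy is one of the incomplete ones is to look at the first and last bytes of the file. The sketch below is only a heuristic and is not something the Wayback Machine provides: it assumes that a complete PDF starts with `%PDF-` and has an `%%EOF` marker near its end. The function name and the 1024-byte window are choices made for this illustration.

```python
def looks_complete(data):
    """Heuristic check that PDF bytes were not cut off mid-download.

    Assumption: a complete PDF begins with the %PDF- header and carries an
    %%EOF marker within its last 1024 bytes. A truncated file usually lacks
    the marker, but this does not validate the PDF in any other way.
    """
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


# Tiny made-up byte strings standing in for real downloads.
complete = b"%PDF-1.4\n(body of the report)\n%%EOF\n"
truncated = b"%PDF-1.4\n(body of the rep"

print(looks_complete(complete))
print(looks_complete(truncated))
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed to `looks_complete`.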



This post was modified by aibek on 2013-11-18 14:27:20


Poster: Kokonor Date: Nov 18, 2013 8:56am
Forum: texts Subject: dead links / downloading pdf files

Thank you.

I'm looking for the complete files to download. I was responsible for the original files, which I have now lost over time. I don't know where else to find them.

As you indicate, I was able to download only the two files that you mention above.

I would be very grateful if I could download all the files above for reposting on archive.org (and elsewhere), although I realize that may be impossible.

Thank you.

Koknor


Poster: aibek Date: Nov 18, 2013 3:53pm
Forum: texts Subject: Re: dead links / downloading pdf files

Hello, I asked because there are many complete pdf files available outside the /community/pdfs/ section. Are you interested in files such as the following?

http://www.thdl.org:80/collections/history/texts/Guru_tashi.pdf
http://www.thdl.org:80/texts/reprints/ebhr/EBHR_20-1.pdf
http://www.thdl.org:80/texts/commdev/ngo-series/ngo-03.pdf
http://www.thdl.org:80/education/english/folktales.pdf
e.g.
http://web.archive.org/web/20051127033510/http://www.thdl.org:80/texts/reprints/kailash/1_1_cover.pdf

I will look some more for the /community/pdfs/ files, and tell you if I find any.


Poster: Kokonor Date: Nov 18, 2013 8:40pm
Forum: texts Subject: dead links / downloading pdf files

Thank you.

I'm aware of the .pdf(s) above.
I just checked, e.g., http://www.thdl.org/texts/commdev/ngo-series/ngo-03.pdf and got sent here
http://www.thlib.org/texts/commdev/ngo-series/ngo-03.pdf

with this message:

In August 2008, THL relaunched its new site with new tools, approaches, and data. We encourage you to go to www.thlib.org and use the search and navigation there to try to locate the resources you are looking for. It may be that the resources are under revision and will not be re-released until later. If you can’t find what you are looking for, please contact us.

Read more: http://www.thlib.org/texts/commdev/ngo-series/ngo-03.pdf#ixzz2l3zSLq8t

But what I really would like to have are the /community/pdfs/ files. At this point, I don't know how to get copies.

Thanks again.

Sincerely,

Koknor


Poster: aibek Date: Nov 19, 2013 1:39am
Forum: texts Subject: Re: dead links / downloading pdf files

I meant to say that those files are present; the first four are not the Wayback Machine links, of course.
The link for the ngo-03.pdf file is:
http://web.archive.org/20070317030705/http://www.thdl.org:80/texts/commdev/ngo-series/ngo-03.pdf


Poster: Kokonor Date: Nov 19, 2013 2:46am
Forum: texts Subject: dead links / downloading pdf files

Thank you. It is my understanding that the Wayback Machine did not save versions of the .pdf(s) I would like copies of. I would be grateful if you could suggest some other service or program that crawls the web and makes copies available.

Sincerely,

Koknor


Poster: aibek Date: Nov 19, 2013 3:40am
Forum: texts Subject: Re: dead links / downloading pdf files

Hello

I found one more:
http://web.archive.org/20091229085355/http://old.thdl.org:80/community/pdfs/honriwaterrep.pdf

1) I was thinking of something like this:
https://chrome.google.com/webstore/detail/web-cache/coblegoildgpecccijneplifmeghcgip

I tried it for a couple of community/pdfs/ but I could find nothing.

2) You could try searching for just the name of the pdf file. This may turn up useful results. For example, on searching for ‘honriwaterrep.pdf’ on Google, I was led to the following page
http://comments.gmane.org/gmane.education.english.teflchina.jobs/4234
where I found another url for the file, and the Wayback Machine had a copy of this file (the one quoted at the top of this post).

(Please note that there is no point in trying the Wayback Machine for archives of old.thdl.org. I have already checked that Wayback Machine has archived only the above mentioned file from old.thdl.org/community/pdfs/.)

3) Another thing you could try is the following. The sites containing the pages in the result may have a “local copy” of the pdfs you are looking for.
https://www.google.com/search?q=thdl.org%2Fcommunity%2Fpdfs%2F
So, e.g., the top result is the LukeWater file. You may try visiting that file, and hope that it has something useful! Perhaps an alternate link, or a “local copy”, or just an extracted text copy.

4) Finally, you could try writing to relevant mailing lists, etc, asking people to check if they have the desired files. Also, if you have the hard drives which had the files, you could try recovering them.
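For suggestion 2, the file names can be pulled out of the links listed earlier rather than retyped. This is only a small sketch: the helper names are made up for this example, and the search address simply follows the pattern of the Google link in suggestion 3.

```python
import posixpath
from urllib.parse import urlencode, urlsplit


def pdf_name(link):
    """Return the last path component of a link, e.g. 'honriwaterrep.pdf'."""
    return posixpath.basename(urlsplit(link).path)


def search_url(file_name):
    """Build a Google search address for a bare file name."""
    return "https://www.google.com/search?" + urlencode({"q": file_name})


link = ("http://web.archive.org/web/20080517054926/"
        "http://www.thdl.org/community/pdfs/honriwaterrep.pdf")
name = pdf_name(link)
print(name)
print(search_url(name))
```

Looping over the whole list of links gives one search address per file.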

Do let me know how the search progresses! Just reply to any post by me and I will get an email notification.

---
General stuff:
https://en.wikipedia.org/wiki/Wikipedia:Link_rot#Repairing_a_dead_link

This post was modified by aibek on 2013-11-19 11:40:21


Poster: aibek Date: Nov 19, 2013 5:18am
Forum: texts Subject: Re: dead links / downloading pdf files

By the way, the following is how I have been searching for the files.

The documentation:
https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server

Please do CDX queries responsibly. (These are highly resource intensive.)

All the files archived from www.thdl.org/community/pdfs/:
http://web.archive.org/cdx/search/cdx?url=http://www.thdl.org/community/pdfs/&matchType=prefix&limit=1000&output=json

You can construct the Wayback Machine link for each result row with the following formula, using the values of the timestamp and original fields (without the double quotes that surround them in the JSON output, of course):
web.archive.org/timestamp/original
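The formula can be applied to a whole response at once. The following is a minimal sketch, assuming that the output=json response is a list of rows whose first row holds the field names. It uses the web.archive.org/web/ form of the links posted earlier in this thread. The function name is made up for this example, and in the sample only the timestamp and original values come from this thread; the other values are placeholders.

```python
import json

# Link form used by the Wayback Machine URLs posted earlier in this thread.
WAYBACK_PREFIX = "http://web.archive.org/web/"


def wayback_links(cdx_json_text):
    """Turn the text of a CDX output=json response into Wayback links."""
    rows = json.loads(cdx_json_text)
    if not rows:
        return []  # nothing matched the query
    header, captures = rows[0], rows[1:]  # assumed: first row = field names
    links = []
    for capture in captures:
        record = dict(zip(header, capture))
        links.append(WAYBACK_PREFIX + record["timestamp"] + "/" + record["original"])
    return links


# Sample response. The urlkey, digest and length values are made-up
# placeholders, not real CDX data.
SAMPLE = """[
 ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
 ["org,thdl)/community/pdfs/schoolrep.pdf", "20081209131248",
  "http://www.thdl.org/community/pdfs/schoolrep.pdf",
  "application/pdf", "200", "PLACEHOLDERDIGEST", "123456"]
]"""

for link in wayback_links(SAMPLE):
    print(link)
```

Looking fields up by name rather than by position keeps the sketch working if the columns come back in a different order.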


All the files archived from old.thdl.org/community/pdfs/:
http://web.archive.org/cdx/search/cdx?url=http://old.thdl.org/community/pdfs/&matchType=prefix&limit=1000&output=json

The field ‘original’ contains the full url, so using regex I am checking if a file called ‘schoolrep.pdf’ is found at any location on thdl.org:
http://web.archive.org/cdx/search/cdx?url=thdl.org&matchType=host&output=json&limit=50&filter=original:.*schoolrep.pdf

Using regex, I search for all pdf files whose length has exactly seven digits, i.e. a size between roughly 1MB and 10MB (each dot in the filter matches one character):
http://web.archive.org/cdx/search/cdx?url=thdl.org&matchType=host&output=json&limit=50&filter=length:.......&filter=mimetype:application/pdf

(Note that in the above, limit is set to 50, so it returns only the first 50 entries. No point in wasting IA resources by asking it to run pointless errands!)
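Queries like the ones above can also be assembled in a few lines of code, which avoids typos in the long addresses. This is only a sketch: the function names are made up, the parameters are the ones used in the queries above, and nothing is actually sent to IA. The second helper assumes that a filter has to match the whole field value (which the leading `.*` in the schoolrep.pdf query suggests) and shows why seven dots select seven-digit lengths.

```python
import re
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_query(url, match_type, filters=(), limit=50):
    """Build (but do not send) a CDX query address with a small limit."""
    params = [("url", url), ("matchType", match_type),
              ("output", "json"), ("limit", limit)]
    params += [("filter", f) for f in filters]  # one filter= per entry
    # Keep ':', '/' and '*' unescaped so the result reads like the
    # hand-written queries in this post.
    return CDX_ENDPOINT + "?" + urlencode(params, safe=":/*")


def matches_whole_field(pattern, value):
    """Assumed filter behaviour: the regex must match the entire value."""
    return re.fullmatch(pattern, value) is not None


print(cdx_query("thdl.org", "host",
                filters=["length:.......", "mimetype:application/pdf"]))
print(matches_whole_field(".......", "1234567"))  # seven digits
print(matches_whole_field(".......", "123456"))   # six digits
```

Keeping the default limit small follows the same idea as above: ask for only as many rows as are needed.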

Note also that the ‘length’ field does not exactly correspond to the file size. I don’t know what it is, but I know that it is always approximately equal to the file size that IA has.

This post was modified by aibek on 2013-11-19 13:18:55